Automatic Construction of Korean Verbal Type Hierarchy using Treebank
Authors
Abstract
The lexical information of verbal lexemes, such as verbs and adjectives, plays an important role in syntactic parsing, because the structure of a sentence hinges mainly on the type of its verbal lexemes. The question we address in this research is how to acquire the ‘argument structure’ (henceforth ARG-ST) of verbal lexemes in Korean. It is well known that building a type hierarchy manually usually costs too much time and too many resources, so an alternative method, namely the automatic collection of the relevant information, is preferable. This paper proposes a procedure to automatically collect the ARG-ST of Korean verbal lexemes from a Korean Treebank. Specifically, the system we develop in this paper first extracts the ARG-ST information of verbal lexemes from a Korean Treebank of 0.8 million graphic words in an unsupervised way, then checks the hierarchical relationships among the extracted structures, and then builds the type hierarchy automatically. The result is written in an HPSG-style annotation, which makes it possible to readily implement it in an HPSG-based parser for Korean. Finally, the result is evaluated against two Korean dictionaries and against a manually constructed type hierarchy.
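The three steps named in the abstract (extraction, hierarchy check, construction) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure of the paper: the toy observations, the case labels, the function names, and above all the rule that a frame with fewer arguments is a supertype of any frame that properly contains it are hypothetical choices made here for the example. The abstract does not say how the hierarchical relationship is actually checked, and no treebank reader is assumed.

```python
from collections import defaultdict

# Hypothetical toy input: (verb stem, case-marked arguments seen in one clause).
# A real treebank would need a reader for its own format; none is assumed here.
OBSERVATIONS = [
    ("ca-", ("nom",)),                 # 'sleep'
    ("mek-", ("nom", "acc")),          # 'eat'
    ("cwu-", ("nom", "dat", "acc")),   # 'give'
    ("mek-", ("nom",)),                # same verb, object not expressed
]

def collect_frames(observations):
    """Step 1: gather the distinct argument frames attested for each verb."""
    frames = defaultdict(set)
    for verb, args in observations:
        frames[verb].add(frozenset(args))
    return frames

def immediate_supertypes(frame, all_frames):
    """Step 2 (assumed rule): proper subsets of `frame` with no frame in between."""
    smaller = [f for f in all_frames if f < frame]
    return {f for f in smaller if not any(f < g for g in smaller)}

def build_hierarchy(frames):
    """Step 3: map every attested frame to its immediate supertypes."""
    all_frames = set().union(*frames.values())
    return {f: immediate_supertypes(f, all_frames) for f in all_frames}

def show(frame):
    """Render a frame in a loosely HPSG-like notation, in a fixed case order."""
    order = ["nom", "dat", "acc"]
    return "ARG-ST <" + ", ".join(f"NP[{c}]" for c in order if c in frame) + ">"

frames = collect_frames(OBSERVATIONS)
hierarchy = build_hierarchy(frames)
for frame in sorted(hierarchy, key=len):
    parents = sorted(show(p) for p in hierarchy[frame])
    # Frames without an attested supertype hang under a hypothetical root type.
    print(show(frame), "<-", parents or ["verbal-lexeme"])
```

Representing each frame as a `frozenset` keeps the subsumption check to a single subset comparison. A real system would also have to deal with argument order, optional arguments, and noise in the treebank, all of which this sketch ignores.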
Similar resources
Automatic acquisition of “noun+verb” idiomatic compounds in Korean
Song, Sanghoun. 2015. Automatic acquisition of “noun+verb” idiomatic compounds in Korean. Linguistic Research 32(1), 253-280. State-of-the-art computational linguistics pays attention to lexical semantics, because it has the potential to improve language processing systems in terms of coverage as well as accuracy. In particular, utilizing multiword expressions is important...
Statistical Dependency Parsing in Korean: From Corpus Generation To Automatic Parsing
This paper gives two contributions to dependency parsing in Korean. First, we build a Korean dependency Treebank from an existing constituent Treebank. For a morphologically rich language like Korean, dependency parsing shows some advantages over constituent parsing. Since there is not much training data available, we automatically generate dependency trees by applying head-percolation rules an...
Wide-Coverage Grammar Extraction from Thai Treebank
Parsing is an important step for natural language understanding, including phrase alignment in support of statistical machine translation. A parser's ability to analyse real text depends strongly on its grammar. A treebank can be one of the sources for grammar extraction. However, treebank construction relies largely on human annotators' intuitions. Different intuitions from multiple annotators br...
Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications
Word formation processes such as derivation and compounding yield realizations of lexical roots in different parts of speech and in different syntactic environments. Using verbal adjectives as a case study and treebanks of Dutch and German as data sources, similarities and divergences in syntactic distributions across different realizations of lexical roots are examined and the implications for...
A Korean Noun Semantic Hierarchy (Wordnet) Construction
Since a thesaurus is used as a knowledge resource in many natural language processing systems, it is very useful and necessary for high-quality systems, especially those dealing with semantics. In this paper, we introduce a semi-automatic method for constructing a Korean noun semantic hierarchy by utilizing a monolingual MRD and an existing thesaurus.